A scalable Helmholtz solver in GRAPES over large-scale multicore cluster

Authors

Linfeng Li

Wei Xue

Rajiv Ranjan

Zhiyan Jin

Abstract

This paper discusses performance optimization of the dynamical core of the global numerical weather prediction model in the Global/Regional Assimilation and Prediction System (GRAPES). GRAPES is a new-generation numerical weather prediction system developed and currently used by the China Meteorological Administration. The computational performance of the dynamical core in GRAPES relies on the efficient solution of three-dimensional Helmholtz equations, which lead to large-scale, sparse linear systems produced by the discretization in space and time. We choose the generalized conjugate residual (GCR) algorithm to solve these linear systems and propose algorithm optimizations for large-scale parallelism in two respects: (i) reducing the number of iterations needed for a solution and (ii) improving the performance of each GCR iteration. The iteration count is reduced by advanced preconditioning techniques that combine a block incomplete LU factorization-k preconditioner over the 7 diagonals of the coefficient matrix with the restricted additive Schwarz method. Each GCR iteration is improved by refactoring the GCR algorithm to reduce global communication operations, which decreases the communication overhead on large numbers of cores. Performance evaluation on the Tianhe-1A system shows that the new preconditioning techniques reduce the number of iterations for solving the linear systems by almost one-third, that the proposed methods obtain a 25% performance improvement on average compared with the original version of the Helmholtz solver in GRAPES, and that the speedup with our algorithms can reach 10 using 2048 cores compared with 256 cores. Copyright © 2013 John Wiley & Sons, Ltd.
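To make the abstract's description of a preconditioned GCR iteration more concrete, here is a minimal serial sketch in Python with NumPy. It is not the solver from the paper: the function names `gcr` and `jacobi`, the small tridiagonal test matrix, and the use of a simple Jacobi (diagonal) preconditioner in place of the block incomplete LU and restricted additive Schwarz combination are all assumptions made for illustration. The communication refactoring described in the abstract is not shown either; the comments only mark where inner products occur, since in a distributed setting each of those would require a global reduction.

```python
import numpy as np


def gcr(A, b, apply_prec, tol=1e-10, max_iter=200):
    """Right-preconditioned GCR sketch (hypothetical helper, not the GRAPES code).

    A          : dense (n, n) array standing in for the sparse Helmholtz matrix
    b          : right-hand side vector
    apply_prec : callable returning an approximation of A^{-1} applied to a vector
    """
    x = np.zeros_like(b)
    r = b - A @ x
    zs, qs = [], []  # stored search directions z_j and their images q_j = A z_j
    b_norm = np.linalg.norm(b)
    for k in range(max_iter):
        # Convergence check on the relative residual (a norm is an inner product,
        # hence a global reduction in a parallel run).
        if np.linalg.norm(r) <= tol * b_norm:
            return x, k
        z = apply_prec(r)
        q = A @ z
        # Orthogonalize q against all earlier q_j (modified Gram-Schmidt).
        # Each dot product here would be one global reduction in parallel.
        for zj, qj in zip(zs, qs):
            beta = np.dot(q, qj)
            z = z - beta * zj
            q = q - beta * qj
        q_norm = np.linalg.norm(q)
        z = z / q_norm
        q = q / q_norm
        # Step length that minimizes the residual norm along the new direction.
        alpha = np.dot(r, q)
        x = x + alpha * z
        r = r - alpha * q
        zs.append(z)
        qs.append(q)
    return x, max_iter


# Assumed toy problem: a diagonally dominant tridiagonal matrix, loosely
# resembling a 1D discretized Helmholtz-type operator.
n = 50
diag = 2.5 + np.linspace(0.0, 1.0, n)
A = np.diag(diag) - np.diag(np.ones(n - 1), 1) - np.diag(np.ones(n - 1), -1)
b = np.ones(n)


def jacobi(v):
    # Stand-in preconditioner: divide by the diagonal of A.
    return v / diag


x, iters = gcr(A, b, jacobi)
print("iterations:", iters)
print("relative residual:", np.linalg.norm(A @ x - b) / np.linalg.norm(b))
```

In this sketch the number of inner products per iteration grows with the number of stored directions, which illustrates why reducing global communication per iteration matters at large core counts. A stronger preconditioner passed as `apply_prec` would be expected to lower the iteration count, which is the other optimization the abstract describes.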

Similar resources

GRAPES: A Software for Parallel Searching on Biological Graphs Targeting Multi-Core Architectures

Biological applications, from genomics to ecology, deal with graphs that represent the structure of interactions. Analyzing such data requires searching for subgraphs in collections of graphs. This task is computationally expensive. Even though multicore architectures, from commodity computers to more advanced symmetric multiprocessing (SMP), offer scalable computing power, currently published...

Delft University of Technology Report 11-01 a Scalable Helmholtz Solver Combining the Shifted Laplace Preconditioner with Multigrid Deflation

A Helmholtz solver whose convergence is parameter independent can be obtained by combining the shifted Laplace preconditioner with multigrid deflation. To prove this claim, we develop a Fourier analysis of a two-level variant of the algorithm proposed in [1]. In this algorithm those eigenvalues that prevent the shifted Laplace preconditioner from being scalable are removed by deflation using mu...

Efficient heterogeneous execution on large multicore and accelerator platforms: Case study using a block tridiagonal solver

The algorithmic and implementation principles are explored in gainfully exploiting GPU accelerators in conjunction with multicore processors on high-end systems with large numbers of compute nodes, and evaluated in an implementation of a scalable block tridiagonal solver. The accelerator of each compute node is exploited in combination with multicore processors of that node in performing block-...

Mixed Large-Eddy Simulation Model for Turbulent Flows across Tube Bundles Using Parallel Coupled Multiblock NS Solver

In this study, turbulent flow around a tube bundle on a non-orthogonal grid is simulated using the Large Eddy Simulation (LES) technique and parallelization of the fully coupled Navier-Stokes (NS) equations. To model the small eddies, the Smagorinsky model and a mixed model were used. This model represents the effect of dissipation and the grid-scale and subgrid-scale interactions. The fully coupled NS eq...

PSPIKE: A Parallel Hybrid Sparse Linear System Solver

The availability of large-scale computing platforms comprised of tens of thousands of multicore processors motivates the need for the next generation of highly scalable sparse linear system solvers. These solvers must optimize parallel performance, processor (serial) performance, as well as memory requirements, while being robust across broad classes of applications and systems. In this paper, ...

Journal:

Concurrency and Computation: Practice and Experience

Volume 25, Issue

Pages -

Publication date: 2013
